1 Introduction

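This document demonstrates how to perform clustering in R. The tidyverse and tidymodels packages are loaded for data handling, the models themselves are fitted with the base R functions kmeans() and hclust(), and the results are visualized with factoextra. Clustering is an unsupervised learning technique that groups similar data points together based on their measured features, without using any class labels. We will use the iris dataset for this demonstration: 150 flowers from three species, each described by four numeric measurements.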

2 Load Data

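First, we load the necessary libraries and the iris dataset. Because clustering is unsupervised, we drop the Species column and keep only the four numeric measurements.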

library(tidyverse)
library(tidymodels)
library(factoextra)

data(iris)
iris_data <- iris %>% select(-Species)
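
Both clustering methods used in this document rely on Euclidean distances, so it can help to check how the features are scaled before fitting anything. The following is a minimal sketch in base R, not part of the original workflow; the name iris_scaled is introduced here purely for illustration and is not used elsewhere in this document.

```r
# Self-contained: rebuild the numeric feature table with base R
iris_data <- iris[, names(iris) != "Species"]

# Confirm the shape and that every remaining column is numeric
dim(iris_data)
sapply(iris_data, is.numeric)

# Compare the spread of each feature; large differences can let one
# feature dominate Euclidean distances
sapply(iris_data, sd)

# Optional (assumption, not part of the original workflow):
# standardize each column to mean 0 and standard deviation 1
iris_scaled <- as.data.frame(scale(iris_data))
```

The rest of this document keeps the unscaled iris_data, which is a reasonable choice here because all four measurements are in centimeters.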

3 K-Means Clustering

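K-Means is a popular clustering algorithm that partitions the observations into k clusters by assigning each one to its nearest cluster center, aiming to minimize the total within-cluster sum of squares. We will use it to group the iris data into 3 clusters, matching the number of species in the dataset. Setting nstart = 25 runs the algorithm from 25 random starting configurations and keeps the best result, and set.seed(123) makes the run reproducible.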
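
K-Means needs the number of clusters up front. One common way to sanity-check the choice is an elbow plot of the total within-cluster sum of squares against k. The sketch below is an illustration under that assumption, not the method used to pick 3 in this document; the names k_values and wss are hypothetical.

```r
# Self-contained: numeric iris features only
iris_data <- iris[, 1:4]

set.seed(123)
k_values <- 1:8

# Total within-cluster sum of squares for each candidate k
wss <- sapply(k_values, function(k) {
  kmeans(iris_data, centers = k, nstart = 25)$tot.withinss
})

# Look for the "elbow" where adding clusters stops helping much
plot(k_values, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```

The factoextra function fviz_nbclust(iris_data, kmeans, method = "wss") draws a similar plot, if a ggplot2-based figure is preferred.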

set.seed(123)
kmeans_model <- kmeans(iris_data, centers = 3, nstart = 25)

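# Visualize the clusters (with more than two features, fviz_cluster
# plots the observations on the first two principal components)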
fviz_cluster(kmeans_model, data = iris_data)
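
Beyond the plot, the fitted kmeans object can be inspected directly. The sketch below refits the same model so that it runs on its own, then looks at cluster sizes, centers, and how the clusters line up with the Species labels that were set aside earlier. The cross-tabulation is an added check, not part of the original analysis, and cluster numbers are arbitrary labels.

```r
# Self-contained: refit the same model as above
iris_data <- iris[, 1:4]

set.seed(123)
kmeans_model <- kmeans(iris_data, centers = 3, nstart = 25)

# Number of observations in each cluster
kmeans_model$size

# Cluster centers, in the original measurement units
kmeans_model$centers

# Share of the total sum of squares explained by the clustering
kmeans_model$betweenss / kmeans_model$totss

# Cross-tabulate clusters against the held-out species labels
table(cluster = kmeans_model$cluster, species = iris$Species)
```

On iris, one species (setosa) is usually separated cleanly, while versicolor and virginica tend to overlap in the resulting clusters.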

4 Hierarchical Clustering

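Hierarchical clustering is another common clustering method. The agglomerative version used here starts with every observation in its own cluster and repeatedly merges the two closest clusters, producing a tree (dendrogram) that can be cut at any level. Unlike K-Means, the number of clusters does not have to be fixed before fitting. We use Euclidean distances and Ward's linkage (ward.D2), which favors merges that keep within-cluster variance small.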

# Calculate the distance matrix
dist_matrix <- dist(iris_data, method = "euclidean")

# Perform hierarchical clustering
hclust_model <- hclust(dist_matrix, method = "ward.D2")

# Visualize the dendrogram
fviz_dend(hclust_model, k = 3, # Cut in 3 groups
          cex = 0.5, # label size
          k_colors = c("#2E9FDF", "#00AFBB", "#E7B800"),
          color_labels_by_k = TRUE, # color labels by groups
          rect = TRUE # Add rectangle around groups
          )
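
The dendrogram colors three groups, but the group memberships themselves are not stored in the hclust object. A minimal sketch of extracting them with cutree() and comparing them with the K-Means result is shown below; the names hc_clusters and km_clusters are introduced here for illustration only.

```r
# Self-contained: rebuild the hierarchical model from above
iris_data <- iris[, 1:4]

dist_matrix <- dist(iris_data, method = "euclidean")
hclust_model <- hclust(dist_matrix, method = "ward.D2")

# Cut the tree into 3 groups; returns one cluster label per observation
hc_clusters <- cutree(hclust_model, k = 3)

# Group sizes
table(hc_clusters)

# Compare with the held-out species labels
table(cluster = hc_clusters, species = iris$Species)

# Compare with the K-Means assignment (cluster numbers are arbitrary,
# so look for a matching pattern rather than matching labels)
set.seed(123)
km_clusters <- kmeans(iris_data, centers = 3, nstart = 25)$cluster
table(hierarchical = hc_clusters, kmeans = km_clusters)
```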

5 Conclusion

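This document provided a brief overview of clustering in R. We demonstrated both K-Means and Hierarchical clustering on the four numeric measurements of the iris dataset, fitting the models with kmeans() and hclust() and visualizing the resulting groups with factoextra.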